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A  COMPARATIVE  POWER  STUDY  OF  THE  BIVARIATE  RANK  SUM 

TEST  AND  T2 
by 

G.  K.  Bhattacharyya  ,  Richard  A.  Johnson 

and 

H.  R.  Neave 

The  small  sample  performance  of  the  bivariate  two-sample 

Wilcoxon  type  rank  sum  test  W  is  studied  by  Monte  Carlo  evaluation 

of  power  under  shift  alternatives  in  the  bivariate  normal  distribution. 

Two  types  of  alternatives  are  considered:  (a)  shift  in  only  one 

coordinate  and  (b)  equal  amounts  of  shifts  in  both  coordinates. 

The  estimated  power  of  the  W  test  is  compared  with  the  power 
2 

of  Hotelling's  T  test.  Interestingly,  it  is  found  that  the  em- 

2 

pirical  power  of  W  substantially  exceeds  the  power  of  T  for 
some  normal  shift  alternatives  although  the  latter  is  the  uniformly 
most  powerful  invariant  test  for  the  problem. 

1.  INTRODUCTION 

The  present  work  is  devoted  to  an  empirical  study  of  the  performance 

of  a  Wilcoxon  type  rank  sum  test,  for  bivariate  shift,  introduced  by  Chatterjee 

and  Sen  [4],  The  main  emphasis  is  on  the  comparison  of  its  power  with  that 
2 

of  Hotelling's  T  test  for  bivariate  normal  shift  alternatives.  Although  the 
test  is  an  extension  of  the  univariate  Wilcoxon  test,  its  application  involves 
a  complex  conditioning  aspect  which  is  absent  in  the  univariate  case  and  may 
have  segregated  it  from  the  domain  of  applied  statistics.  Therefore  the  test 
is  described  in  Section  2  and  its  use  illustrated. 


i 


-2- 


Nothing  whatever  is  known  about  the  small  sample  power  of  the  bivariate 

Wilcoxon  test  W  (it  is  the  test  R  of  Chatterjee  of  Sen  [4]).  The  computation 

of  exact  power  Is  extremely  difficult  due  to  the  involvement  of  the  probabilites 

of  bivariate  rank  configurations.  Consequently  we  employ  Monte  Carlo 

simulation  to  estimate  the  power  for  several  bivariate  normal  alternatives 

2 

and  compare  the  results  with  the  exact  power  of  the  T  test. 

The  Pitman  asymptotic  relative  efficiency  (ARE)  of  W  with  respect  to 

2 

T  depends  on  the  direction  in  which  the  shift  occurs  as  well  as  on  the 
correlation.  Bounds  on  the  ARE  for  normal  and  other  bivariate  distributions 
have  been  studied  by  Bickel  [2]  and  Bhattacharyya  and  Johnson  [  1],  This 
large  sample  measure  essentially  reflects  the  relative  performance  of  the  tests 
under  local  alternatives  and  is  not  always  a  satisfactory  guide  to  the  compara¬ 
tive  power  in  small  or  moderate  samples.  The  ARE  values  corresponding  to 
the  alternatives  considered  for  empirical  power  are  presented  in  Section  3 
for  the  purpose  of  showing  the  manner  in  which  the  ARE  reflects  power. 

It  is  found  that  over  most  of  the  alternatives  considered,  the  empirical 

2 

power  of  W  does  not  lag  appreciably  behind  the  power  of  T  test  which  is  the 

uniformly  most  powerful  invariant  test  for  the  normal  family.  What  is  more 

2 

interesting  ,  the  power  of  W  seems  to  exceed  that  of  T  for  certain  shift 
alternatives  and  certain  correlations  in  the  normal  distribution.  It  is  an 
unusual  situation  where  a  nonparametric  test  seems  to  perform  better  than  the 
best  known  parametric  test  and  certainly  contrasts  with  the  univariate  case 
where  the  power  of  the  Wilcoxon  test  always  falls  short  of  the  t-test  for 
normal  alternatives  since  the  latter  is  UMP  unbiased  (for  numerical  values 
see  [5] ).  This  points  to  the  need  for  further  theoretical  studies  on  the  test  W 
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2.  THE  BIVARIATE  RANK  SUM  TEST  W. 

Let  (Xj,  Yj),  i  =  1,2,...,  m  and  (X^,  Y^),  j  =  m+1,  . . . ,  m+n  be 
independent  random  samples  from  two  bivariate  populations  with  continuous  cdf's 
F(x,y)  and  G(x,y)  respectively.  Consider  testing  the  null  hypothesis  Hq:F  =  G 
against  the  shift  alternataves  G(x,y)  =  F(x-Gp  y-02)  where  (GpO^)  ^  (0,  0). 

The  computation  of  the  test  statistic  W  consists  in  combining  the  two 
samples  together  and  ranking  the  N=m+n  x  and  y  observations  separately 
in  increasing  order  of  magnitude.  The  combined  sample  rank  matrix  thus 
obtained  is  denoted  by 

'  R11  R12  •  •  •  Rlm  :  Rlm+1  *  •  *  R1N 

(1)  Jl  =  I 

\  R21  R22  “-^m  :  R2m+1*  *  *  R2N 


where  the  dotted  line  represents  a  partitioning  of  the  ranks  according  to  the 
first  and  the  second  samples.  The  test  statistic  is  given  by 


(W f-  2q  WjW2  +  W2  ) 


m(N+l)/2  ,  o  =  l,2 
1  N 

V  ru[r21-  (N+D/2]  . 
i=l  11  ^ 


Note  that  Wj  and  W2  are  the  Wilcoxon  rank  sums  in  terms  of  the  X  and  Y 
marginals  and  q  is  Spearman's  rank  correlation  calculated  from  the  entire 
combined  sample. 

The  determination  of  the  rejection  region  involves  partitioning  the  N 
columns  of  into  two  groups  of  m  and  n  in  all  (  ^  )  possible  ways  and 
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recomputing  (2)  in  each  rase.  Let  Wj  <  <  ...  <  W'^  be  the  ordered 

^  m  ^ 

values  thus  obtained.  The  level  a  rejection  region  of  the  permutation  test 

consists  of  the  k  largest  values  of  W'  if  k  =o,(m)  *s  an  integer.  If  k  is 

not  an  integer,  we  have  to  randomize  on  the  boundary  or  slightly  change  a  to 

N 

make  k  an  integer.  In  practice  one  need  not  always  compute  all  the  (  )  values. 

It  is  often  possible  to  recognize  the  column  partitions  which  give  the  largest 
a(^)  values  of  W  by  inspection  of  and  only  a  few  trial  computations. 

W  is  well  defined  except  for  the  case  q  =  +  1.  If  q  happens  to  be  +1, 
we  modify  it  to  q1  =  l-€  where  t  is  a  very  small  number  (the  value  used  here 
is  €  =  .  001).  Similarly  if  q  =  -1,  we  replace  q  in  (2)  by  q’  =  -1  +  t .  This 
in  some  sense  preserves  the  continuity  of  W.  Unless  the  bivariate  distribution 
degenerates  to  a  lower  dimension,  the  occurrence  of  q  =  +  1  has  very  small 
probability  and  hence  such  modification  is  seldom  needed. 

Example.  In  reliability  studies,  it  is  often  of  interest  to  compare  similar 

systems  produced  by  two  competing  manufacturers.  Suppose  a  system  consists 

of  two  dissimilar  components  arranged  in  parallel  so  that  the  system  fails  if  and 

only  if  both  the  components  fail.  Even  though  the  arrangement  is  in  parallel, 

the  assumption  of  complete  independence  of  the  functioning  of  the  components  is 

often  unrealistic.  Thus  we  assume  that  the  failure  times  (X,Y)  of  the  two 

components  have  the  continuous  bivariate  distribution  F(x,y)  for  one  system 

and  G(x,y)  for  the  other  and  consider  testing  of  the  hypothesis  Hq:  F  =  G 

against  G(x, y)  =  F(x- Gp  y-02),  Hotelling’s  T2  test  requires 

the  assumption  not  only  of  bivariate  normality  for  F  and  G  but  also  of  equality 

for  their  covariance  matrices.  Such  assumptions  for  life  distributions  are 

2 

hard  to  justify  and  hence  the  applicability  of  the  T  test  is  dubious.  The  W 
test  on  the  other  hand  requires  no  assumption  on  the  forms  of  F  and  G 
or  on  their  covariance  matrices.  We  illustrate  its  application  by  considering 


the  following  hypothetical  failure  data 


System  I  System  II 


X 

13.  5 

14.  0 

8.6 

13.  3 

13. 

2 

8.  5 

7.8 

8.  3 

y 

3.  5 

18.  7 

4.6 

18.  3 

17. 

8 

16.6 

5.7 

10.  8 

The  combined  sample  rank  matrix  becomes 


00 

r- 

4 

6 

5 

3  1 

2  N\ 

00 

2 

7 

6 

5  3 

4  ) 

Using  (2)  and  (3),  we  have 
q  =  15/42  =  . 35714 
W1  =  7,  W2=  0 
V  =  (l-q2)W  =  49 

Q 

The  nonrandomized  significance  level  not  exceeding  .  05  is  a  =  3/  (  4)  =  .  04  2  8  6 
and  hence  k  =  3.  Since  q  is  constant  for  all  partitions  of  g_  ,  the  rejection 
region  can  be  equivalently  given  in  terms  of  V.  It  would  consist  of  the  three 
largest  values  of  V'  under  all  column  partitions  of  ^  .  By  inspection  and 
a  few  trial  computations,  we  see  that  each  of  the  following  sets  of  four 
columns  of  ^  yield  the  largest  value  of  V'(=  57.  1430): 

f  ( 1 )( 2 )(4 )(5 )1,  f  (3 ),  (6),  (7),  (8)],  [(2),  (4),  (5),  (6)],  [( 1),  (3),  (7),  (8)]. 

The  numbers  within  braces  represent  the  column  numbers  of  when  they 

are  numbered  serially  from  left  to  right.  The  next  lower  value  of  V'  is  the 

observed  value  V  =  49  and  therefore  H  is  accepted  at  a  =  .  04  2  8  6  or 

o 

even  at  a  =  4/(^)  =  .  05714. 
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L  EMPIRICAL  POWER  Or  W  AND  COMPARISON  WITH  T2 

In  this  section,  Monte  Carlo  simulation  is  employed  to  estimate 

the  power  of  the  W  test  for  shifts  in  several  bivariate  normal  distributions 

2 

and  the  estimated  power  is  then  compared  with  the  exact  power  of  the  T  test, 
computed  from  the  tables  of  Tang  [6]  and  Tiku  [7],  Since  both  the  tests  are 
invariant  under  scale  changes  in  x  and  y  coordinates,  without  loss  of 
generality  we  take  unit  variances  in  all  the  normal  distributions.  Four 
evenly  spaced  values  0,  .3,  .6  and  .  9  were  selected  for  the  correlation 
parameter  p. 

There  are  an  enormous  number  of  possible  choices  for  the  vector  shift 

parameter  ^  =  (0,,O2).  For  simplicity  in  computation,  we  consider  two  types 

of  shift:  (a)  a  shift  in  only  one  coordinate,  0^  =  0  and  02  =  0  >0  and  (b) 

equal  amount  of  shift  in  each  coordinate,  0j  =  02  =  0V>  0.  Four  different  values 

are  chosen  for  the  single  shift  parameter  in  each  case.  Since  our  objective 

2  * 

is  to  compare  W  with  T  ,  we  choose  the  values  of  0  and  0  so  that  there 

2 

is  an  even  coverage  of  the  range  of  power  of  T  and  also  so  that  the  power 
2 

T  can  be  read  directly  from  the  tables  [6, 7]  without  interpolation. 

More  specifically,  it  is  known  that  for  the  alternative^  in  a  bivariate 

2 

normal  distribution  with  correlation  p,  the  statistic  (N-3)  T  /2(N-2)  is 
distributed  as  a  noncentral  F  with  (2,  N-3)  degrees  of  freedom  and 
noncentrality  parameter  X  where 

(4)  X2  =  (mn/N)(l-p2)  *(02  -  02)  . 

_  i 

The  power  is  tabulated  in  [6,  7]  for  different  N  and  certain  values  of  <{>  =X3  z. 

For  each  of  the  combined  sample  sizes  N  =  8,  10  and  12,  it  is  found  that  the 

2 

cf>-values  1.  0,  1.  5,  2.  0  and  3.  0  cover  evenly  the  range  of  power  of  the  T 
test.  Thus  fixing  the  values  of  b,  we  determine  the  amounts  of  shift  by 
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specializlng  (4)  to  the  cases  (a)  and  (b)  mentioned  above.  That  is, 
0  and  O'  are  determined  from 


(5) 


7  1 
0  =  .*[  3J(l-p  )  /  mn]  42 

i 

0  =  H  3N  (1  +p)/2mn  ]  2  . 


A  total  of  1,  000  replications  were  made  for  each  of  the  three  sample 

sizes  (m,n)  =  (4,4),  (6,4)  and  (9,  3)  and  for  the  different  alternatives 

mentioned  above.  The  data  were  generated  by  a  uniform  (0,1)  generator 

combined  with  a  certain  transformation.  The  uniform  generator  employed  was 
— 2  6 

{2  Vj  }  where 

,5  .  .  ,2£v 

v1+i=  5  Vjfrnod  2  ) 

26 

with  an  arbitrary  odd  starting  value  vQ  between  0  and  2  .  If  Uj  and  U2 

are  independent  uniform  (0,1)  random  variables  andX  and  Y  are  defined  by 

I  2  -i  1 

X  =  (-21  n  UL)2  sin  (2irU2)+  p(l-p  )  V^nUj)2  cos(2ttU2) 

(6) 

Y  =  (-2fn  UL)2  cos  (2u  U2)  , 

then  (X, Y)  is  bivariate  normal  N(0,  0, 1, 1,  p)(c. f.  Box  and  Muller  [3]  ).  For 

a  specific  (m,n),  N  =  m+n  pairs  (x,y)  were  thus  generated  and  the  last  n  of 

them  were  shifted  according  to  the  two  types  (a)  and  (b).  The  permutation  test 

W  was  applied  with  the  two  nonrandomized  significance  levels  and  «2  which 

envelope  a  -  .  05.  The  rejection  numbers  corresponding  to  and  «2  were 

determined  from  1,  000  replications  and  the  rejection  number  corresponding  to 

a  =  .  05  was  then  interpolated.  The  numbers  entered  in  Tables  1-3,  when 

divided  by  1,  000,  give  the  estimated  powers  of  the  W  test  at  a  =  .  05.  The 

2 

exact  power  of  the  T  test  corresponding  to  specified  values  of  4>  are  read 
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diroctly  from  |6 , 7 )  and  entered  in  the  second  row  of  each  table,  l  or  the 
sample  size  (8,4),  linear  interpolation  in  the  reciprocal  of  the  degrees  of 
freedom  as  discussed  by  Tiku  [7]  was  employed  to  interpolate  power.  The 
rest  of  each  table  exhibits  the  estimated  power  of  the  W  test  for  the  two 
types  of  shift  and  four  values  of  p.  Thus  each  column  of  a  table  presents  the 
empirical  power  of  W  at  different  points  on  a  constant  power  surface  of  the 
T2  test. 

As  a  check  on  the  extent  of  internal  scatter  of  the  estimates  of  power, 
the  1,  000  replications  in  each  case  were  run  in  four  groups  of  250  each.  It 
was  found  that  the  variation  of  the  rejection  numbers  from  group  to  group 
was  quite  small  in  most  cases.  As  is  typical  with  the  simulation  study  on 
a  permutation  test,  the  most  important  contributing  factor  to  the  computer 
time  was  the  generation  d  (^)  column  partitions  of  and  recomputation  of 
the  values  of  W.  The  time  used  for  the  present  job  was  approximately  five 
hours. 

Table  1.  Exact  power  of  T  and  the  rejection  numbers  for  W 
in  1,  000  replications  for  the  two  types  of  shift. 

Sample  size:  m  =  4,  n  =  4;  significance  level  a  =  .  05 


Table  2.  Exact  power  of  T“  and  the  rejection  numbers  for  W 
in  1 ,  000  replications  for  the  two  types  of  shift. 

Sample  size:  m  =  6,  n  =  4,  Significance  level  a-.  05 


0,=0_>0 


1.  0 


*  .  223 


218.  5 


213.  5 


:  185 


149 


6.  5 


.  449 


449.  5 


448.  5 


2.  0 


.  596 


696.  5 


699 


638 


418 


3.  0 


.  963 


948 


945 


920 


745 


■ - 

l  p 

=.6 

i: 

253 

493 

732.  5, 

962.  5 

i  p 

=.  9 

263.  5 

520.  5 

777.  5 

964.  5 

Table  3.  Exact  power  of  T  and  the  rejection  numbers  for  W  in 
1, 000  replications  for  the  two  types  of  shift. 

Sample  size:  m  =  9,  n  =  3  significance  level  a  =  .  ( 


2.  0 


.  74  3 


3.  0 


.  978 


95 


1  216 

1 

417  | 

64  3 

926 

i  189 

344 

521 

838 

0,=0  J>0 


24  3 


4  56 


955 

954 

96? 
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Conskier  also  the  ARC  of  W  with  respect  to  which  for  the 
bivariate  normal  family  considered  depends  on  the  correlation  p  and  the 
two  types  of  shift.  For  shifts  in  the  direction  =  (° j ,  0 2 ),  the  ARE  is 
given  by  (c.f.  Chatterjee  and  Sen  [4]) 


(7) 


eyy.'j>(S^P) 


2 

rr 


d-p2'  <6r  2Poei92+e2  > 

(1-Pq)  of-  2p01e2  +  of  ) 


where  pQ  =  (6/rr)  arc  sin  (p/2).  For  p  =  0,  the  expression  (7)  reduces  to 
3/tr  which  is  the  ARE  of  the  univariate  Wilcoxon  test  relative  to  t-test. 

The  lower  and  upper  bounds  of  (7)  for  all  ^  &  and  all  -1  <  p  <  1  are  .  87 
and  97  respectively  [2]  .  With  a  view  to  investigating  the  extent  to  which 
the  ARE  reflects  the  comparative  power  in  the  present  situation,  we  compute 
its  numerical  values  for  the  two  types  of  shift  and  four  values  of  p  and 
present  these  in  Table  4. 


Table  4.  Values  of  e^.^g^p) 
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4.  COMCLUSIcm 

The  scope  of  the  present  power  study  is  limited  in  that  only  three 
pairs  of  sample  sizes  for  each  of  four  bivariate  normal  alternatives  are  considered. 
Due  to  the  permutational  structure  of  the  W  test,  an  elaboration  of  the  ex¬ 
periment  toward  higher  sample  sizes  and  non-normal  bivariate  alternatives 
would  demand  an  exorbitant  amount  of  computer  time.  However,  within  the 
domain  of  the  present  study,  our  general  observation  is  that  the  power  of  W 
does  not  fall  too  far  behind  that  of  the  best  parametric  test  except  for  the 
type  (a)  shift  occurring  in  the  presence  of  high  correlation.  In  all  cases,  the 
power  of  W  increases  with  increasing  4>  as  does  the  power  of  T^. 

Trom  Table  1,  one  observes  that  the  estimated  powers  of  W  are  somewhat 

2  • 

higher  thai  the  power  of  T  for  the  type  (a)  shift  occurring  in  the  presence  of 

small  p.  Also  from  Tables  1  and  2,  it  appears  that  with  increasing  correlation, 

2 

the  power  of  W  for  type  (b)  shifts  tends  to  appreciably  exceed  that  of  the  T 

test.  Although  one  cannot  reach  too  definite  conclusions  from  empirical 

2 

studies,  the  strong  indication  that  W  has  higher  power  than  T  in  some 
cases  emfhasizes  the  need  for  more  theoretical  studies  on  the  W  test. 
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